arXiv:1501.01497vl [math.OC] 7 Jan 2015 


AN OPTIMAL SUBGRADIENT ALGORITHM FOR LARGE-SCALE 
BOUND-CONSTRAINED CONVEX OPTIMIZATION 

MASOUD AHOOKHOSH* AND ARNOLD NEUMAIERt 

Abstract. This paper shows that the OSGA algorithm - which uses first-order information 
to solve convex optimization problems with optimal complexity - can be used to efficiently solve 
arbitrary bound-constrained convex optimization problems. This is done by constructing an explicit 
method as well as an inexact scheme for solving the bound-constrained rational subproblem required 
by OSGA. This leads to an efficient implementation of OSGA on large-scale problems in applications 
arising signal and image processing, machine learning and statistics. Numerical experiments demon¬ 
strate the promising performance of OSGA on such problems, ions to show the efficiency of the 
proposed scheme. A software package implementing OSGA for bound-constrained convex problems 
is available. 

Key words, bound-constrained convex optimization, nonsmooth optimization, first-order black¬ 
box oracle, subgradient methods, optimal complexity, high-dimensional data 
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1. Introduction. Let V be a finite-dimensional real vector space and V* its dual 
space. In this paper we consider the bound-constrained convex minimization problem 

min f(Ax) ^ 1 . 

s.t. x € x, ^ ‘ 

where / : x -> K is a - smooth or nonsmooth - convex function, A : R” —> R m 
is a linear operator, and x = [x,x\ is an axiparallel box in V in which x and x are 
the vectors of lower and upper bounds on the components of x , respectively. (Lower 
bounds are allowed to take the value — oo, and upper bounds the value +oo.) 

Throughout the paper, ( g , x) denotes the value of g £ V* at x € V. A subgradient 
of the objective function / at x is a vector g(x) G V* satisfying 

f{z) > f(x) + {.g{x),z-x), 

for all z £ V. It is assumed that the set of optimal solutions of 0 is nonempty 
and the first-order information about the objective function (i.e., for any x £ x, the 
function value f(x) and some subgradient g{x) at x) are available by a first-order 
black-box oracle. 

Motivation &: history. Bound-constrained optimization in general is an important 
problem appearing in many fields of science and engineering, where the parameters 
describing physical quantities are constrained to be in a given range. Furthermore, it 
plays a prominent role in the development of general constrained optimization meth¬ 
ods since many methods reduce the solution of the general problem to the solution of 
a sequence of bound-constrained problems. 

There are lots of bound-constrained algorithms and solvers for smooth and nons¬ 
mooth optimization; here, we mention only those related to our study. Lin & More 
in [55] and Kim et al. in [53] proposed Newton and quasi-Newton methods for solv¬ 
ing bound-constrained optimization. In 1995, Byrd et al. mi proposed a limited 

* Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, 
(masoud.ahookhosh@univie.ac.at) 

t Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria. 
(Arnold.Neumaier@univie.ac.at) 


1 



2 


M. Ahookhosh and A. Neumaier 


memory algorithm called LBFGS-B for general smooth nonlinear bound-constrained 
optimization. Branch et al. in [T3j proposed a trust-region method to solve this prob¬ 
lem. Neumaier & Azmi [35] solved this problem by a limited memory algorithm. 
The smooth bound-constrained optimization problem was also solved by Birgin et 
al. in [9’ and Hager & Zhang in [25, 26| using nonmonotone spectral projected 
gradient methods, active set strategy and affine scaling scheme, respectively. Some 
limited memory bundle methods for solving bound-constrained nonsmooth problems 
were proposed by Karmitsa & Makela [201 EH] ■ 

In recent years convex optimization has received much attention because it arises 
in many applications and is suitable for solving problems involving high-dimensional 
data. The particular case of bound-constrained convex optimization involving a 
smooth or nonsmooth objective function also appears in a variety of applications, 
of which we mention the following: 


Bound-constrained linear inverse problems. Given A,W £ R mxn , b £ 

and A £ R, for m > n, the bound-constrained least-squares problem is given by 

1 , 


min f(x) := A || Ax — 6||| + \^{x) 

S.t. £ € X, 

and the bound-constrained l\ problem is given by 

min f(x) := \\Ax — 6||i + Xip(x) 
s.t. x £ x, 


( 1 . 2 ) 


(1.3) 


where x = [x , x\ is a box and ip is a smooth or nonsmooth regularizes often a weighted 
power of a norm; see Section |4| fo r examples. 

The problems (1.2) and ( |l.3[ ) are commonly arising in the context of control and 
inverse problems, especially for some imaging problems like denoising, deblurring and 
inpainting. Morini et al. [23] formulated the bound-constrained least-squares prob¬ 


lem (1.2) as a nonlinear system of equations and proposed an iterative method based 
on a reduced Newton’s method. Recently, Zhang & Morini [39] used alternating 
direction methods to solve these problems. More recently, Chan et al. in [IB], Bo? 
et al. in m, and Bo? & Hendrich in m proposed alternating direction methods, 
primal-dual splitting methods, and a Douglas-Rachford primal-dual method, respec¬ 


tively, to solve both (1.2) and (1.3) for some applications. 


Content. In this paper we show that the optimal subgradient algorithm OSGA pro¬ 
posed by Neumaier in [33] can be used for solving bound-constrained problems of the 


form (1.1). In order to run OSGA, one needs to solve a rational auxiliary subproblem. 


We here investigate efficient schemes for solving this subproblem in the presence of 
bounds on its variables. To this end, we show that the solution of the subproblem has 
a one-dimensional piecewise linear representation and that it may be computed by 
solving a sequence of one-dimensional piecewise rational optimization problems. We 
also give an iterative scheme that can solve OSGA’s subproblem approximately by 
solving a one-dimensional nonlinear equation. We give numerical results demonstrat¬ 
ing the performance of OSGA on some problems from applications. More specifically, 


in the next section, we investigate properties of the solution of the subproblem (2.1) 


that lead to two algorithms for solving it efficiently. In Section 3, we report numerical 
results with artificial and real data, where to solve large-scale imaging problems we 
use the inexact scheme to find the solution of OSGA’s subproblem. Finally, Section 
4 gives some conclusions. 
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2. Overview of OSGA. OSGA (see Algorithm 1) is an optimal subgradient 
algorithm proposed by Neumaier in [33] that constructs - for a problem (1.1) with 


arbitrary nonempty, closed and convex domain, not necessarily a box - a sequence of 
iterates whose function values converge with the optimal complexity to the minimum 
of (1.1). Furthermore, OSGA requires no information regarding global parameters 


like Lipschitz constants of function values and gradients. 

Apart from first-order information at the iterates, OSGA requires an efficient 
routine for finding a maximizer x = U("f, h ) and the optimal objective value E( 7 , h) 
of an auxiliary problem of the form 


sup 

s.t. 


E. 


7 ,h (•£) 
X £ x, 


( 2 . 1 ) 


where it is known that the supremum is positive. The function E 1 h '■ x —> R is defined 

by 


-^7 , h { p ^) 


7 +{h,x) 
Q(x) 


( 2 . 2 ) 


with 7 £ R, h £ V*, and Q : x — > R is a prox-function, i.e., a strongly convex function 
satisfying inf^x Q(x) > 0 and 


Q{z) > Q(x) + (g Q (x ), z - x) + °^\\z - x\\ 2 , (2.3) 

where the convexity parameter is <7 = 1. In particular, with Q 0 > 0, if x £ R n , we 
may take 

Q{x) := Q 0 + i||ai - if x G R", (2.4) 

where ||x ||2 = \/X)i X 1 the Euclidean norm, and 

Q(x):=Q 0 + ^\\x-x°\\ 2 f if * G R mxn , (2.5) 


where ||x||i? = x lk i s tbe Frobenius norm. 


In pnI33j. the unconstrained version (2.1) with the prox-function (2.4) or (2.5) is 
solved in a closed form. Numerical experiments and comparisons with some state-of- 
the-art solvers E0III show the promising behaviour of OSGA for solving practical 
unconstrained problems. A version of OSGA that can solves structured nonsmooth 
problems with the complexity 0(e -1 / 2 ) discussed in [1]. In [3], it is shown that the 
OSGA subproblem can be solved for many simple domains. In this paper we show 
that for the above choices of Q(x) and an arbitrary box x, the subproblem (2.1) can 
be solved efficiently. It follows that OSGA is applicable to solve bound-constrained 
convex problems as well. 

It is shown in Neumaier [33] that OSGA satisfies the optimal complexity bounds 
O (e -2 ) for Lipschitz continuous nonsmooth convex functions and O (e -1 / 2 ) for smooth 
convex functions with Lipschitz continuous gradients; cf. Nemirovsky & Yudin m 
and Nesterov [31 1 132]. Since the underlying problem ( | 1 . 1 | ) is a special case of the 
problem considered in [33] . the complexity of OSGA remains valid for (1.1). 

As discussed in [33], OSGA uses the following scheme for updating the given pa¬ 
rameters a , h, 7 , 77, and u: 
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Algorithm 1: PUS (parameters updating scheme) 

Input: 5, a max G ] 0 , 1 [, 0 < k' < k, a, 77, h, 7 , fj, u; 
Output: a, h, 7 , 77, u; 

begin 

R^- ( rj-rj)/(6ar 7 ); 

if R < 1 then 

h i — h ; 

else 

a G- min(ae K ' (i?_1) , a max ); 

end 

cn ^— 015 

if 77 < 77 then 

h G- hf, 7 G- 7 ; 77 G- 77; u <— u\ 

end 

end 


Algorithm 2: OSGA (optimal subgradient algorithm) 

Input: global parameters: S, a max G ]0,1[, 0 < n' < k; local parameters: 

Xo • 1 1 G 0, /target ■ 

Output: x b , f Xb \ 
begin 

choose an initial best point 07 ; 
compute f Xb and 

if fx b < /target then 

stop; 

else 

h = g Xb ~ WQ(x h )-, 7 = f Xb - gQ(xb) ~ (h,x b )\ 
lb = l~fx b \ u = U ( 7 b, /i); 9 = E(i b ,h) - 

end 

ct t o max , 

while stopping criteria do not hold do 
x = x b + a{u — x b ); compute / x and g x ; 
g = g x - ggQ(x)-, h = h + a(g - h); 

7 = 7 + a(f x - /t< 3 (x) - (g, x) - 7 ); 

x'b = argmin z£{;C!)iX} f(z, 7+)| / x > = min{/ Xi) , / x }; 

lb ~1 ~ fx b ; u'= U('y' b ,h); 

x' = Xb + a(i/ — Xb); compute f x /\ 

choose Xb in such a way that f Xb < minj/^a, f x '}; 

lb = 1 ~ fx b I u = U{l b ,h ); 77 = E(l b ,h) - p; x b =x b \ f Xb =fx b \ 

if fx b < /target then 

stop; 

else 

update the parameters a, h, 7 , 77 and rt; 

end 

end 

end 
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3. Solving OSGA’s rational subproblem ( |2.1[ ). In this section we investi¬ 
gate the solution of the bound-constrained subproblem <H 1 ) and give two iterative 
schemes, where the first one solves ( 2 . 1 ) exactly and the second one solves it approx¬ 
imately. 


3.1. Explicit solution of OSGA’s rational subproblem (2.1). In this sub¬ 


section we describe an explicit solution of the bound-constrained subproblem ( 2 . 1 ). 

Without loss of generality, we here consider V = K n . It is not hard to adapt 
the results to V = R nx ” and other spaces. The method is related to one used in 
several earlier papers. In 1980, Helgason et al. E?J characterized the solution of 
singly constrained quadratic problem with bound constraints. Later, Paradolas 
& Kovoor m developed an 0(n) algorithm for this problem using binary search 
to solve the associated Kuhn-Tucker system. This problem was also solved by Dai 
& Fletcher m using a projected gradient method. Zhang et al. [40] solved the 
linear support vector machine problem by a cutting plane method employing a similar 
technique. 

In the papers mentioned, the key is showing that the problem can be reduced to 
a piecewise linear problem in a single dimension. To apply this idea to the present 


problem, we prove that ( 2 . 1 ) is equivalent to an one-dimensional minimization problem 


and then develop a procedure to calculate its minimizer. We write 

a;(A) := sup{a:, inf{a ; 0 — A h,x}} 
for the projection of x° — Xh to the box x. 


(3.1) 


Proposition 3.1. For h ^ 0, the maximum of the subproblem (2.1) is attained 


at x := x(X), where A > 0 or X = +00 is the inverse of the value of the maximum. 

Proof. The function E 7j h : V —> R defined by (2.2) is continuously differentiable 
and from Proposition 2.1 in [33] we have e := E > 0. By differentiating both side 
of the equation E 1 y l (x)Q(x) = —7 — (h,x), we obtain 


dE 1 , h 

dx 


Q(x) = —e[x — x°) — h. 


At the maximizer x, we have eQ(x ) = —7 — (h,x). Now the first-order optimality 
conditions imply that for i = 1 , 2 ,... ,n, 


< 0 


= 0 


if Xi = x t , 


— e(xi — xlf) — hi l >0 if Xi = 


(3.2) 


if x { < Xi < Xi. 


Since e > 0, we may define A := e 1 and find that, for i = 1,2,..., n, 


Xi — \ Xi 


- Xhi 


if ^ > ar° — Xhi, 
if Xi < x^ — Xhi, 
if x< < — Xhi < Xi. 


(3.3) 


This implies that x = x(X). □ 


Proposition 3.1 gives the key feature of the solution of the subproblem (2.1) 
implying that it can be considered in the form of © with only one variable A. In 
the remainder of this section, we focus on deriving the optimal A. 

Example. 3.1. Let first consider a very special case thatx is nonnegative orthant, 
i.e., x_i = 0 and Xi = +00 , for i = 1, • • • ,n. Nonnegativity constraint is important 
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in many applications, see 0 E3 E3 H3 G2f. In this case we consider the quadratic 
function 


Q(z) -\\z\\l + Q 0 , 


(3.4) 


where Q o > 0. In m, it is proved that ( f 3.J\ ) is a prox-function. By using this prox- 
function and \3.1\), we obtain 


x(X) = sup{x, inf{— Xh, x} = —A(ft)_, 


where (z)_ = min{0, z}. By Proposition 2.2 of W3\ I. we have 

^ ^9 IkMHl + + 7 + {h,x( A)) = ^||(ft) -111 + (ft,®(A))^ A 2 + 7 A + Q 0 = 0 . 


By substituting A = 1/e into this equation, we get 


Pie + P 2 X + /?3 = 0, 


where Pi = Q 0 , P 2 = 7, and P 3 = ^||(ft)_||| — (ft, (ft)-. Since we search for the 
maximum e, the solution is the bigger root of this equation, i.e., 


_ -Pi + VP 2 ~4PiP 3 
2 Pi 

This shows that for the nonnegativity constraint the subproblem j[2.1\ ) can be solved 
in a closed form, however, for a general bound-constrained problem, we need a much 


more sophisticated scheme to solve (2.1). 


To derive the optimal A > 0 in Proposition 0T1 we first determine its permissible 


range provided by the three conditions considered in (3.31 leading to the interval 


A € [Ai, A*], 


(3.5) 


for each component of x. In particular, if hi = 0, since x° is a feasible point, Xi = 
x / — Aftj = x / satisfies the third condition in (3.3). Thus there is no upper bound for 
A, leading to 


X_i =0, A i = +00 if Xi = x^, hi = 0. 


(3.6) 


If hi ^ 0, we consider the thre e cases (i) x t > x ® — Xhi, (ii) Xi < x ° — A hi, and 
(iii) x t < x^ — Xhi < Xi of (3.3). In Case (i), if h t < 0, division by h t implies that 
A < —{Xj — Xi)/hi < 0, which is not in the acceptable range for A. In this case, if 
hi > 0 then A > — (a^ — x®)/hi leading to 


x_i~x° 

h 


, Xi = +00 if Xi = Xi, hi > 0. 


(3.7) 


In Case (ii), if hi < 0 then A > —{xi — x®)/hi implying 


A i = ———-, A i = +00 if Xi = Xi, hi < 0. 


(3.8) 
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In Case (ii), if h t > 0 then A < — (x* — x®)/ hi < 0, which is not in the acceptable 
range of A. In Case (iii), if hi < 0, division by hi implies 

rp rp . rp 0 

— 11 < A <--— 

hi hi 

The lower bound satisfies ~(x { — x°)/hi < 0, so it is not acceptable, leading to 

A i = 0, A i = - Xl Xi if Xi = x° - A hi G [x^x* ], h t < 0. (3.9) 

lli 

In Case (iii), if hi > 0, then 


rp . _ rp _ rp 0 

**' 4 , , - •*'4 **'4 

\ A \ — 


hi hi 

However, the lower bound — (x$ — x®)/hi < 0 is not acceptable, i.e., 


A, — 0, Xi — — - 


if Xi = x° — A hi G [XjXi], hi > 0. 

Hi 

As a result, the following proposition is valid. 

Proposition 3.2. If x{ A) is solution of the problem (2.1), then 

A G [Aj, A*] i = !,■■■ , n, 

where X t and Xi are computed by 


(3.10) 


Xi - xy 


A j = < 


o 
0 
0 


if Xi = Xp hi > 0, 

if Xi = Xi, hi < 0, T 

~ _ Aj — 

if Xi G [XjXj], hi < 0, 

if Xi G [xjXj], hi > 0, 
if hi = 0, 


Xi — x^ 


' +oo if Xi = Xi, hi > 0, 

+oo if Xi = Xi, hi < 0, 

if Xi G [XjXj], hi < 0, 

rii 

— x° 

——- if Xi G [x^Xj], hi > 0, 

lli 

„ +oo if hi = 0, 

(3.11) 


in which Xi = x® — Xhi for i = 1, ■ • ■ ,n. 

Proposition |3.2| implies that each element of x satisfies in only one of the conditions 
")) ( |3.10[ ). Thus, for each i = 1,... , n with hi 0, we have a single breakpoint 


Xi - Xi 


A, := 


+oo 


if hi < 0, 

if hi > 0, 
if hi = 0. 


(3.12) 


Sorting the n bounds A,, i = 1, in increasing order, augmenting the re¬ 

sulting list by 0 and +oo, and deleting possible duplicate points, we obtain a list of 
to + 1 < n + 2 different breakpoints, denoted by 


0 - Ai < A 2 < ... < A m < A m+ i — +oo. 


(3.13) 
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By construction, x(X) is linear in each interval [Afc, Afc+i], for k = 1, • • • , m. The next 
proposition gives an explicit representation for x(X). 

Proposition 3.3. The solution x(X) of the auxiliary problem (2.1) defined by 

(3.1) has the form 


where 


v\ = 


x(X)=p k + Xq k for A e [A fc , A fe+ i] (k = 1,2,..., m), 



>- 

II 

© 

i 

r 0 

o' 

II 

-c? 

«+H 


if Afc+i < A i, 

q k _ J 

1 -hi 

if Afc + i ft Aj, 

Xi 

if A k ^ A2, hi ^ 0, 

9* I 

1 0 

if A* > Aj, hi > 0, 

Xi 

if Afc ^ A2, hi 0, 


l 0 

if A* > Aj, hi < 0. 


(3.14) 


(3.15) 


Proof. Since A > 0, there exists k e {!,■■■ , to} such that A £ [Afc,Afc+i]. Let 


sign of hi, i mpl ying — A hi 
Proposition ! 


i £ {1,- ■ • ,n}. If hi = 0, (3.6) implies Xi = x®. If hi 0, the way of construction 
of Xi for i = 1, ■ ■ ■ , to implies that A,; g (Afc,Afc+i)> so two cases are distinguished: 
(i) Afc_|_i < Aj; (ii) Afc > Xi. In Case (i), Pro posit ion 3.2 implies that Aj = Aj, while 
it is not possible Xi X,. Therefore, either (3.9) or (3.10) holds dependent on the 

£ [Xi,Xi], so that p k = xi and q k = —hi. In Case (ii), 
implies that A i = Aj, whil e it is not possible A, 7 ^ Aj. Therefore, either 
fl3.7|) or (|3.8[) holds. If hi < 0, then (3.8) holds, i.e., p\ = Xi and q k = 0. Otherwise, 


3.2 


(3.7) holds, implying p k = x, and q( = 0. This proves the claim. □ 


Proposition 3.3 exhibits the solution x(X) of the auxiliary problem (2.1) as a 


piecewise linear function of A. In the next result, we show that solving the problem 


(2.1) is equivalent to maximizing a one-dimensional piecewise rational function. 

Proposition 3.4. The maximal value of the subproblem (2.1) is the maximum 
of the piecewise rational function e(A) defined by 


3 (A) := 


+ h k X 


c k + d k X + SfcA 2 


if X £ [Afc, Afc_|_i] (k — 1,2, 


*)> 


(3.16) 


where 


a k ■= -7 - (h,p k ), b k := -{h, q k ), 


Ck~Qo+\\\p k d k := (p k - x°,q k ), s k := ^\\q k \\ 2 . 


Moreover, c k > 0, c k > 0 and 4s k c k > d k . 

Proof. By Proposition 3.1 and 3.3 the global minimize r of (|2.1| ) has the form 
(3.14). For k = 1,2,...,to and A £ [Afc,Afc+i], we substitute ( 3.14[ ) into the function 

(2.2), and obtain 


E^, h {x( A)) = - 


7 + (h, x k (X)} = j+{h,p k +g k X) 
Q(x k ( A)) Q(p k + q k A) 

7 + (h,p k ) + (h,q k )X 


(3.17) 


Qo + b\\p k - £°|| 2 + ( p k - x°,q k )X + i||g fc || 2 A 2 


= e(A), 


as defined in the proposition. 
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Since Q o > 0, we have Ck > 0 and the denominator of (3.16) is bounded away 
from zero, implying 4> d 2 . It is enough to verify Sk > 0. The definition of qk in 
(3.15) implies that hi / 0 for i £ I = {* : Xk+i < A^} leading to q k 0 implying 
Sk > 0 . □ 

The next result leads to a systematic way to maximize the one-dimensional ra¬ 
tional problem (3.16). 

Proposition 3.5. Let a, b, c, d, and s be real constants with c > 0, s > 0, and 
4 sc > d?. Then 


m ■■= 


cl + bX 


c-\- d\-\- sX 2 


(3.18) 


defines a function <f> : R. —> R that has at least one stationary point. Moreover, the 
global maximizer of <fi is determined by the following cases: 


(i) If b ^ 0, then a 2 — b(ad— bc)/s > 0 and the global maximum 


is attained at 


A = 


4>(X) = —^- 

2sX + d 

—a + \Ja 2 — b(ad — be)/s 


(ii) 7/6 = 0 and a > 0, the global maximum is 

4 as 


</(A) = 


attained at 


A = - 


4cs — d 2 ’ 

d 


2 s' 


(3.19) 


(3.20) 


(3.21) 


(3.22) 


(in) 7/6 = 0 and a < 0, the maximum is = 0, attained at A = +00 for a < 0 
and at all A eK for a = 0. 

Proof. The denominator of (3.18) is positive for all A £ R" iff the stated condition 
on the coefficients hold. By the differentiation of <f and using the first-order optimality 
condition, we obtain 


4>\ A) = 


6(c + g?A + sA 2 ) — (a + 6A)(d + 2sA) bs\ 2 + 2asA + ad — be 


(c + dX + s A 2 ) 2 


(c + dX + s A 2 ) 2 


For solving <j>'(X) = 0, we consider possible solutions of the quadratic equation bsX 2 - 
2asA + ad — be = 0. Using the assumption 4sc > d 2 , we obtain 

(2as) 2 — 46s(ad — be) — (2 as) 2 — (Aabds — 46 2 cs) 

= (2as) 2 — Aabds — b 2 (d 2 — 4cs — d 2 ) 

= (2as) 2 — Aabds + (bd) 2 — 6 2 (d 2 — 4cs) 

> (2as — bd) 2 — b 2 (d 2 — 4cs) > 0, 


leading to 


2 b(ad — 6c) 


a — 


> 0 , 
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implying that d>'( A) =0 has at least one solution, 
(i) If b ± 0, then 


2 b(ad — be) 2 bd b 2 c 


a — 


bd 


= a 1 - a -- = [a — — + —(4sc — cr) > 0 


2s 


4s 2 


implying there exist two solutions. Solving <j>'( A) = 0, the stationary points of the 
function are found to be 


A = 


—a ± sja 2 — b(ad — bc)/s 


(3.23) 


Therefore, a + bX = ±w with 

w := \Ja 2 — b(ad — be)/s > 0, 

and we have 

m = ±w 


(3.24) 


c + dX + sX 2 

Since the denominator of this fraction is positive and w > 0, the positive sign in equa¬ 
tion (3.23) gives the maximizer, implying that (3.20) is satisfied. Finally, substituting 

b 2 w 

f s(w — a) 2 

b 2 w 


this maximizer into (3.24) gives 

m = 


c + dA + sA 2 b 2 c + bd{w - a) + s{w - a) 2 
b 2 w 

a 2 s — b(ad — be) + sw 2 + (bd — 2 as)w 2 sw 2 + (bd — 2 as)w 

b 2 w b 


;(2 s(w-a) + bd) 2sA + d’ 


so that ( |3.19[ ) holds. 

(ii) If b = 0, we obtain 


4'W = 


—a(d + 2sX) 

(c + dX + sX 2 ) 2 ' 


Hence the condition <f>’(X) = 0 implies that a = 0 or d + 2sX = 0. The latter case 
implies 

x —s- = 


whence A is a stationary point of cf>. If a > 0, its maximizer is A = — ^ an( l (3.21) is 
satisfied. 

(iii) If b = 0 and a < 0, then 

lim 4>(X) = lim 4>(X) = 0 

A— y —oo A—^ ~I - oo 

implies (f>(X) = 0 at A = ±oo. In case a = 0, <j>(X) = 0 for all AeK. □ 

We summarize the results of Propositions |3.1||3.5| into the following algorithm for 
computing the global optimizer Xb and the optimum of (2.1). 
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Algorithm 3: BCSS (bound-constrained subproblem solver) 

Input: Qo,x°,h,x,x-, 

Output: Ub = U("/,h),eb = e(xb)', 

begin 

for i = 1 ,2,. . ., n d o 

find A i by (3.12| using x and x\ 

end 


determine the breakpoints A k, k = 1, ..., m + 1, by (3.13 ); 
e b = 0; 

for k = 1, 2, ..., to do _ 

compute p k and q k u sing (|3.15[ ); 
construct e(A) using (3.166 lor [A*, Afc+i]; 


3.5 


find the maximizer A of e(A) using Proposition 
if A e [Afc, Afe + i] then 

compute e k = e(A) using Proposition 
A fc = A; 

else 

e k = max{e(A fc ),e(A fc+ i)}; 

A fc = argmax ie{fe)fc+1} {e(Ai)}; 

end 

E{k) = e k , LAM(k) = X k ; 


3.5 


end 


j = argrna x{E(i) \ i = 1, • • • , m}; 
eb = E(j), A = LAM(j), Ub = x° — Aft; 


end 


3.2. Inexact solution of OSGA’s rational subproblem (|2.1[). In this sub¬ 


section we give a scheme to compute an inexact solution for the the subproblem (2.1). 


We here use the quadratic prox-function (|3.4|). 
Theorem 3.1 in 


In view of Proposition 3.1 


and 


the solution of the subproblem (2.1) is given by x(X) defined in 
(3.1), where A can be computed by solving the one-dimensional nonlinear equation 

(p(X) = 0, 


in which 


^ (A) : = !Q l|:r(A)l1 ^ +0 ° 


■ 7 + (ft, x(X)). 


(3.25) 


The solution of OSGA’s subproblem can be found by Algorithm 3 (OSS) in 0- In 
1.3], it is shown that in many convex domains the nonlinear equation ( 3.25| ) can be 
solved explicitly, however, for the bound-constrained problems it can be only solved 
approximately. 

As discussed in [3], the one-dimensional nonlinear equation can be solved by some 
zero-finder schemes such as the bisection method and the secant bisection scheme 
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described in Chapter 5 of [35]. One can also use the MATLAB’s fzero function 
combining the bisection scheme, the inverse quadratic interpolation, and the secant 
method. In the next section we will use this inexact solution of OSGA’s rational 
subproblem (2.1) for solving large-scale imaging problems. 


4. Applications and numerical experiments. In this section we report some 
numerical results to show the performance of OSGA compared with some state-of- 
the-art algorithms on both artificial and real data. 

A software package implementing OSGA for solving unconstrained and bound- 
constrained convex optimization problems is publicly available at 

http://homepage.univie.ac.at/masoud.ahookhosh/. 

The package is written in MATLAB, where the parameters 


S = 0.9; 


= 0.7; k = k' = 0.5; T 


target 


= —OO 


are used. We also use the prox-function (2.4) with Q 0 = ^||a^o ||2 + e, where e is the 
machine precision. Some examples for each class of problems are available to show 
how the user can implement it. The interface to each subprogram in the package is 
fully documented in the corresponding file. Moreover, the OSGA user’s manual |2j 
describes the design of the package and how the user can solve his/her own problems. 

The algorithms considered in the comparison use the default parameter values 
reported in the corresponding papers or packages. All implementations were executed 
on a Toshiba Satellite Pro L750-176 laptop with Intel Core i7-2670QM Processor and 
8 GB RAM. 

4.1. Experiment with artificial data. In this subsection we deal with solving 


the problem (1.1) with the objective functions 


/O) = h\ Ax ~ ^lll + \ 

f(x) = |||Ax - b\' 2 
f{x) = \\Ax- 6||i 
fix) = 11 Ax — 6||i + ||x||i 


x 


2 11-'“' "112 + IfIIi 

INI! 


(L22L22R), 

(L22L1R), 

(L1L22R), 

iLlLlR). 


(4.1) 


The problem is generated by 


[A, z, x] = i_laplace(n), b = z + 0.1 * rand, 


where n is the problem dimension and i_laplace.m is a code generating an ill-posed 
test problem using the inverse Laplace transformation from the Regularization Tools 
MATLAB package, which is available at 

http://www.imm.dtu.dk/~pcha/Regutools/. 

The upper and lower bounds on variables are generated by 


x = 0.05 * ones(n), x = 0.95 * ones(n), 


respectively. Since among the problems (4.1) only L22L22R is differentiable, we need 
some nonsmooth algorithms to be compared with OSGA. In our experiment we con¬ 
sider two versions of OSGA, i.e., a version uses BCSS for solving the subproblem (2.1) 
(OSGA-1) and a version uses the inexact solution described in Subsection 3.2 for solv¬ 
ing the subproblem (2.1) (OSGA-2), compared with PSGA-1 (a projected subgradient 
algorithm with nonsummable diminishing step-size), and PSGA-2 (a projected sub¬ 
gradient algorithm with nonsummable diminishing steplength), see [12] . 
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We solve all of the above-mentioned problems with the dimensions n = 2000 and 
n = 5000. The results for L22L22R and L22L1R are illustrated in Table 1 and Figure 

1, and the results for L1L22R and L1L1R are summarized in Table 2 and Figure 

2. More precisely, Figures 1 and 2 show the relative error of function vales versus 
iterations 

4 := (fk - 7)/(/o - 7), (4.2) 

where / denotes the minimum and /o shows the function value on an initial point Xq. 
In our experiments, PSGA-1 and PSGA-2 exploit the step-sizes a := 5/-v/fc||s r fe|| and 
a := 0.1/y/k, respectively, in which k is the iteration counter and gu is a subgradient 
of / at Xk- The algorithms are stopped after 100 iterations. 


Table 4.1: Result summary for L22L22R and L22L1R 



problem 

dimension 

PSGA-1 

PSGA-2 

OSGA-1 

OSGA-2 

fb 

Time 

L22L22R 

n = 2000 

81.8427 

0.74 

77.1302 

0.75 

77.1285 

4.15 

77.1285 

3.08 

fb 

Time 

L22L22R 

n = 5000 

4.7561e+2 

3.67 

4.2646e+2 

3.58 

4.2640e+2 

14.09 

4.2645e+2 

7.57 

fb 

Time 

L22L1R 

n = 2000 

1.8922e+2 

0.67 

1.8827e+2 

0.61 

1.8682e+2 

3.91 

1.2367e+2 

1.21 

fb 

Time 

L22L1R 

n = 5000 

7.0679e+2 

3.72 

6.8084e+2 

3.42 

6.7887e+2 

14.20 

6.8064e+2 

7.61 


Table 4.2: Result summary for L1L22R and L1L1R 



problem 

dimension 

PSGA-1 

PSGA-2 

OSGA-1 

OSGA-2 

fb 

Time 

L1L22R 

n = 2000 

1.8981e+2 

0.69 

1.9420e+2 

0.75 

1.8671e+2 

4.04 

1.8676e+2 

2.73 

fb 

Time 

L1L22R 

n = 5000 

1.9256e+2 

3.69 

3.4612e+2 

3.57 

1.6971e+2 

14.06 

1.6995e+2 

6.83 

fb 

Time 

L1L1R 

n = 2000 

2.6713e+2 

0.76 

2.6936e+2 

0.75 

2.6536e+2 

4.27 

2.6703e+2 

3.02 

fb 

Time 

L1L1R 

n = 5000 

6.9728e+2 

3.68 

7.4536e+2 

3.57 

6.9411e+2 

14.46 

6.9687e+2 

7.46 


In Figure 1, Subfigures (a) and (b) show that OSGA-1 and OSGA-2 substantially 
outperform PSGA-1 and PSGA-2 with respect to the relative error of function values 
4 (4.21, however, they need more running time. In this case OSGA-1 and OSGA-2 
are competitive, while OSGA-1 performs better. In Figure 1, Subfigure (c) shows that 
OSGA-2 produces the best results and the others are competitive. Subfigure (d) of 
Figure 1 demonstrates that OSGA-1 attains the best results and SPGA-2 and OSGA- 
2 are competitive but much better than SPGA-1. In Figure 2, Subfigures (a) and (b) 
show that OSGA-1 and OSGA-2 are comparable but much better that PSGA-1 and 
PSGA-2. Subfigures (c) and (s) show that the best results produced by OSGA-1 and 
OSGA-2, respectively. 
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(a) L22L22R, n = 2000 (b) L22L22R, n = 5000 




(c) L22L1R, n = 2000 (d) L22L1R, n = 5000 

Fig. 4.1: A comparison among PSGA-1, PSGA-2, OSGA-1, and OSGA-2 for solving 
L22L22R and L22L1R, where the algorithms were stopped after 100 iterations. 


4.2. Image deblurring/denoising. Image deblurring/denoising is one of the 
fundamental tasks in the context of digital imaging processing, aiming at recovering 
an image from a blurred/noisy observation. The problem is typically modelled as 
linear inverse problem 


y = Ax + w, x G V, 


(4.3) 


where V is a finite-dimensional vector space, A is a blurring linear operator, a; is a 
clean image, y is an observation, and w is either Gaussian or impulsive noise. 

The system of equation (4.3) is usually underdetermined and ill-conditioned, and 
u) is not commonly available, so it is not possible to solve it directly, see [34]. Hence 
the solution is generally approximated by an optimization problem of the form 


min 


h\Ax-b\\l + \ip{x) 
xGV, 


(4.4) 


s.t. 
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(a) L1L22R, n = 2000 (b) L1L22R, n = 5000 




(c) L1L1R, n = 2000 (d) L1L1R, n = 5000 

Fig. 4.2: A comparison among PSGA-1, PSGA-2, OSGA-1, and OSGA-2 for solving 
L1L22R and L1L1R, where the algorithms were stopped after 100 iterations. 


or 


min \\Ax — 6||i + \tp(x) 

s.t. x GV, 


where tp is a smooth or nonsmooth regularizer such as <p{x) = <p{ x ) = ll^llii 

p(x) = ||:t||/tv, and <p(x) = II^Hatr- Among the various regularizers, the total vari¬ 
ation is much more popular due to its strong edge preserving feature. Two important 
types of the total variation, namely isotropic and anisotropic, see m, are defined for 
x e R mxn by 


IMI/tv 


+ 



1 

1 


y 's /A (*Gj +1 x i,j ) ~ 

A \ x m : j + l X m j\ 


= e r 1 e; - x itj \ + \ XiJ+1 - Xitj \} 

+ Er 1 ]*»+l,n - Xi,n\ + E" 1 \ x m,j+l~ x m j\, 


and 


IMUtv 
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respectively. 


The common drawback of the unconstrained problem (4.4) is that it usually gives 


a solution outside of the dynamic range of the image, which is either [0,1] or [0,255] 
for 8-bit gray-scale images. Hence one has to project the unconstrained solution to 
the dynamic range of the image. However, the quality of the projected images is not 
always acceptable. As a result, it is worth to solve a bound-constrained problem of 
the form (1.2) in place of the unconstrained problem (|4.4|), where the bounds are 


defined by the dynamic range of the images, see ® Eg BET 

The comparison concerning the quality of the recovered image is made via the 
so-called peak signal-to-noise ratio (PSNR) defined by 


PSNR = 201og 10 

\||X - X t \\F, 

and the improvement in signal-to-noise ratio (ISNR) defined by 

'\\y-x t \\ F ' 


ISNR = 20 log 10 


\X ~ X t \\F 


(4.6) 


(4.7) 


where || • ||f is the Frobenius norm, Xt denotes the mxn true image, y is the observed 
image, and pixel values are in [0,1]. 


4.2.1. Experiment with isotropic total variation. We here consider the 
image restoration from a blurred/noisy observation using the model (1.2) equipped 
with the isotropic total variation regularizer. We employ OSGA, MFISTA (a mono¬ 
tone version of FISTA proposed by Beck & Teboulle in jS]), ADMM (an alternat¬ 
ing direction method proposed by Chan et al. in eg). and a projected subgradient 
algorithms PSGA (with nonsummable diminishing step-size, see jl2j). In our imple¬ 
mentation we use the original code of MFISTA and ADMM provided by the authors, 
with minor adaptations to solve the problem form (1.2) and to stop in a fixed number 
of iterations. 

We now restore the 512x 512 blurred/noisy Barbara image. Let y be a blurred/noisy 
version of this image generated by a 9 x 9 uniform blur and adding a Gaussian noise 
with zero mean and standard deviation set to 10 _3//2 . Our implementation shows 
that the algorithms are sensitive to the regularization parameter A. Hence we con¬ 
sider three different regularization parameters A = 1 x 1(R 2 , A = 7 x 10 —3 , and 
A = 4 x 10~ 3 . All algorithms are stopped after 50 iterations. The results of our 
implementation are summarized in Table 3 and Figures 3 and 4. 

The results of Table 3 and Figure 3 show that function values, PSNR, and ISNR 
produced by the algorithms are sensitive to the regularization parameter A. However, 
the function values are less sensitive. Subfigures (a), (c), and (e) show that OSGA 
gives the best performance in terms of function values. According to subfigures (b), 
(d), and (f), the best ISNR is attained for A = 4 x 10 -3 , and the algorithms are 
comparable with each other, but OSGA outperforms the others slightly. Figure 4 
illustrates the resulting deblurred images for A = 4 x 10 -3 . 


4.2.2. Experiment with li isotropic total variation. In this subsection we 
study the image restoration from a blurred/noisy observation using the model (1.3) 
equipped with the isotropic total variation regularizer. The optimization problem is 
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(a) 8k versus iterations, A = 1 X 10 3 (b) ISNR versus iterations, A = 1 X 10 3 




(c) &k versus iterations, A = 6 X 10 4 (d) ISNR versus iterations, A = 6 X 10 4 




(e) 8k versus iterations, A = 4 X 10 


(f) ISNR versus iterations, A = 4 X 10 


Fig. 4.3: A comparison among PSGA, MFISTA, ADMM, and OSGA for deblurring 
the 512 x 512 Barbara image with the 9x9 uniform blur and the Gaussian noise with 
deviation 10" 3//2 . The algorithms were stopped after 50 iterations. The subfigures 


(a), (c) and, (e) display the relative error 6k (4.2) of function values versus iterations 


and (b), (d), and (f) show ISNR (4.7) versus iterations. 
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(c) PSGA: / = 1.4402e + 2, PSNR. = 23.77, T = (d) MFISTA: / = 1.4321e + 2, PSNR = 

2.09 23.67, T = 31.93 



(e) ADMM: / = 1.4329e + 2, PSNR = (f) OSGA:/ = 1.4294e + 2, PSNR = 23.77, T = 

23.63, T = 2.06 5.49 

Fig. 4.4: Deblurring of the 512 x 512 Barbara image with the 9x9 uniform blur and 
the Gaussian noise with deviation 10 -3 / 2 by PSGA, MFISTA, ADMM, and OSGA 
with the regularization parameter A = 4 x 10 -3 . The algorithms were stopped after 
50 iterations. 
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Table 4.3: Result summary for the isotropic total variation 



A 

PSGA 

MFISTA 

ADMM 

OSGA 

PSNR 


23.69 

23.64 

23.59 

23.74 

fb 

i x icr 2 

1.6804e+2 

1.6580e+2 

1.6705e+2 

1.6531e+2 

Time 


2.15 

30.55 

2.05 

5.60 

PSNR 


23.73 

23.66 

23.67 

23.76 

fb 

7 x lO -3 

1.5694e+2 

1.5543e+2 

1.5599e+2 

1.5500e+2 

Time 


2.09 

31.31 

2.12 

5.39 

PSNR 


23.77 

23.67 

23.63 

23.77 

fb 

4 x lO -3 

1.4402e+2 

1.4321e+2 

1.4329e+2 

1.4294e+2 

Time 


2.09 

31.93 

2.06 

5.49 


solved by DRPD1, DRPD2 (Douglas-Rachford primal-dual algorithms proposed by 
Bo? & Hendrich in [ID]), ADMM, and OSGA. 

Here we consider recovering the 256 x 256 blurred/noisy Fingerprint image from a 
blurred/noisy image constructed by a 7 x 7 Gaussian kernel with standard deviation 5 
and salt-and-pepper impulsive noise with the level 40%. The algorithms are stopped 
after 50 iterations. Three different regularization parameters A = 3 x 10 _1 , A = 
1 x 10 _1 , and A = 8 x 10 -2 are considered. The results are presented in Table 4 and 
Figures 5 and 6. 

The results of Figure 5 demonstrates that the algorithms are sensitive to the 
regularization parameter. For example, ADMM get the best function value and an 
acceptable ISNR compared with the others for A = 3 x 10 —1 , however, for A = 1 x 10 _1 
and A = 8 x 10 -2 , it behaves worse than the others. In the sense of ISNR, DRPD1, 
DRPD2, and OSGA are competitive, but OSGA get the better results. The resulting 
images for A = 8 x 10 -2 are illustrated in Figure 6, demonstrating that ADMM could 
not recover the image properly, however, DRPD1, DRPD2, and OSGA reconstruct 
acceptable approximations, where OSGA attains the best PSNR. 


Table 4.4: Results for the l\ isotropic total variation 



A 

DRPD-1 

DRPD-2 

ADMM 

OSGA 

PSNR 

fb 

Time 

3 x lO” 1 

17.95 

1.4581e+4 

1.04 

17.61 

1.4867e+4 

0.67 

18.47 

1.4476e+4 

0.81 

18.67 

1.4564e+4 

6.11 

PSNR 

fb 

Time 

1 x 10 _1 

21.87 

1.3571e+4 

1.28 

21.23 

1.3635e+4 

0.83 

20.10 

1.3799e+4 

0.76 

22.05 

1.3612e+4 

6.01 

PSNR 

fb 

Time 

8 x 10 -2 

22.26 

1.3519e+4 

1.00 

21.56 

1.3573e+4 

0.65 

18.67 

1.3879e+4 

0.79 

22.46 

1.3582e+4 

6.11 


5. Concluding remarks. This paper addresses an optimal subgradient algo¬ 
rithm, OSGA, for solving bound-constrained convex optimization. It is shown that the 
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(a) <5fc versus iterations, A = 2 X 10 1 (b) ISNR versus iterations, A = 2 X 10 1 




(c) <5*. versus iterations, A = 8 X 10 2 (d) ISNR versus iterations, A = 8 X 10 2 




(e) <5*. versus iterations, A = 6 X 10 2 (f) ISNR versus iterations, A = 6 X 10 2 


Fig. 4.5: A comparison among DRPD1, DRPD2, ADMM, and OSGA for deblurring 
the 256 x 256 Fingerprint image with the various regularization parameter A. The 
blurred/noisy image was constructed by the 7x7 Gaussian kernel with standard 
deviation 5 and salt-and-pepper impulsive noise with the level 50%. The algorithms 
were stopped after 50 iterations. Subfigures (a), (c), and (e) display the relative error 
5k (4.2) of function values versus iterations, and (b), (d), and (f) show ISNR (4.7) 
versus iterations. 
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(a) Original image 


(b) Blurred/noisy image 



(c) DRPD-1: / = 1.3571e + 4, PSNR = (d) DRPD-2: / = 1.3635e + 4, PSNR = 

21.87, T= 1.28 21.23, T = 0.83 




(e) ADMM: / = 1.3799e + 4, PSNR = (f) OSGA: / = 1.3612e + 4, PSNR = 22.04, T = 

20.10, T = 0.76 6.01 

Fig. 4.6: Deblurring of the 256 x 256 Fingerprint image using DRPD1, DRPD2, 
ADMM, and OSGA with the regularization parameter A = 7 x 10 -3 . The algorithms 
were stopped after 50 iterations. The blurred/noisy image was constructed by the 
7x7 Gaussian kernel with standard deviation 5 and salt-and-pepper impulsive noise 
with the level 50%. 
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solution of OSGA’s subproblem has a piecewise linear form in a single dimension. Af¬ 
terwards, we give two iterative schemes to solve this one-dimensional problem, where 
the first one solves OSGA’s subproblem exactly and the second one inexactly. In par¬ 
ticular, the first scheme uses the piecewise linear solution to translate the subproblem 
into a one-dimensional piecewise rational problem. Subsequently, the optimizer of the 
subproblem is founded in each interval and compared to give the global solution. The 
second scheme substitutes the piecewise linear solution into the subproblem and give 
a one-dimensional nonlinear equation that can be solved with zero finders to give the 
optimizer. Numerical results are reported showing the efficiency of OSGA compared 
with some state-of-the-art algorithms. 

Acknowledgement. We would like to thank Radu Bot and Min Tao for making 
their codes DRPD-1, DRPD-2, and ADMM available for us. 
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